Protein families and TRIBES in genome sequence space.

Authors

  • Anton J Enright
  • Victor Kunin
  • Christos A Ouzounis
Abstract

Accurate detection of protein families allows assignment of protein function and the analysis of functional diversity in complete genomes. Recently, we presented a novel algorithm called TribeMCL for the detection of protein families that is both accurate and efficient. This method allows family analysis to be carried out on a very large scale. Using TribeMCL, we have generated a resource called TRIBES that contains protein family information, comprising annotations, protein sequence alignments and phylogenetic distributions describing 311 257 proteins from 83 completely sequenced genomes. The analysis of at least 60 934 detected protein families reveals that, with the essential families excluded, paralogy levels are similar between prokaryotes, irrespective of genome size. The number of essential families is estimated to be between 366 and 426. We also show that the currently known space of protein families is scale free and discuss the implications of this distribution. In addition, we show that smaller families are often formed by shorter proteins and discuss the reasons for this intriguing pattern. Finally, we analyse the functional diversity of protein families in entire genome sequences. The TRIBES protein family resource is accessible at http://www.ebi.ac.uk/research/cgg/tribes/.
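As a rough illustration of what "scale free" means for family sizes: if the number of families of size k falls off as a power law, then the log of the count is approximately linear in the log of the size. The sketch below fits that slope by ordinary least squares. It is a crude diagnostic rather than the method used in the paper, and the function names and the toy data are hypothetical, invented for this example only.

```python
import math
from collections import Counter


def size_distribution(family_sizes):
    """Map each family size to the number of families with that size."""
    return Counter(family_sizes)


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(size).

    For a power law count(k) ~ k**(-gamma), the slope approximates -gamma.
    """
    xs = [math.log(size) for size in distribution]
    ys = [math.log(count) for count in distribution.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic toy data (NOT from TRIBES): 64 singletons, 16 families of
# size 2, 4 of size 4 and 1 of size 8, i.e. count(k) = 64 * k**-2.
toy_sizes = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
toy_slope = loglog_slope(size_distribution(toy_sizes))
print(f"fitted log-log slope: {toy_slope:.3f}")
```

On real data a straight-line fit on log-log axes is only suggestive; heavy-tailed but non-power-law distributions can look similar over a limited range.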

Similar resources

The properties of protein family space depend on experimental design

MOTIVATION: Databases of protein families often exhibit drastically different properties of the protein family space. RESULTS: We compared the properties of protein family space as reflected by exhaustive protein family databases and databases with predefined families. We used TRIBES, Protomap, ProDom and COGs as representatives of the exhaustive databases, and Pfam-A and Superfamily as databas...

Protein Energy Malnutrition in preschool tribe’s children of Chhattisgarh

Objective: The aim of this study was to determine the prevalence of protein energy malnutrition (PEM) by anthropometric measurements in preschool children of a tribal community of Chhattisgarh. Material and Methods: A total of 449 children (237 boys and 212 girls) from 286 families were selected randomly. Anthropometric measurements were taken as per standard protocol. The level of underweight, stun...

Physical organization of the 1.709 satellite IV DNA family in Bovini and Tragelaphini tribes of the Bovidae: sequence and chromosomal evolution.

Repetitive DNA in the mammalian genome is a valuable record and marker for evolution, providing information about the order and driving forces related to evolutionary events. The evolutionarily young 1.709 satellite IV DNA family is present near the centromeres of many chromosomes in the Bovidae. Here, we isolated 1.709 satellite DNA sequences from five Bovidae species belonging to Bovini: Bos ...

Molecular detection of proteolytic activity of human parechovirus 2A protein by gene expression

Parechoviruses form one of the nine genera in the Picornaviridae family, and include two human pathogens: human parechovirus type 1 and type 2 (Hpev1 and Hpev2). The genome of picornaviruses encodes a single polyprotein, which undergoes a cleavage cascade performed by virus-encoded proteases to give the final virus proteins. The primary cleavage is carried out by the 2A protein and this step is critical for vi...

Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467

It is now clear that protein is just one of the functional products of the eukaryotic genome. Indeed, a major part of the human genome is transcribed into non-coding sequences rather than into protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...

Journal:
  • Nucleic Acids Research

Volume 31, Issue 15

Publication date: 2003